MiningMart: Sharing Successful KDD Processes

Authors

  • Timm Euler
  • Katharina Morik
  • Martin Scholz
Abstract

  • Abstraction: Metadata are given at different levels of abstraction, a conceptual level and a relational level. This makes an abstract case understandable and re-usable.
  • Data and case documentation: The database objects (tables or views) and their conceptual counterparts are stored declaratively. So is the chain of preprocessing operations, including all operators' parameter settings. All entities can be given explanatory names and further free-text descriptions. Thus all details of a case can be restored, modified and executed again at any time (http://mmart.cs.uni-dortmund.de/).
  • Ease of case adaptation: In order to run a given sequence of operators on a new database, only the relational metadata and their mapping to the conceptual metadata have to be written.

2 The MiningMart Meta Model, M4

In order to exchange successful knowledge discovery cases, a formalism that describes them in abstract yet operational terms proved to be adequate. A conceptual data model is used to reference data by common everyday notions rather than by specific database names. All data transactions are described at this level. Of course, this conceptual level must be mapped to the actual data, which is described by the relational data model. After that, users work only with the convenient conceptual model.

The formalism behind the conceptual data model is an ontology. It introduces concepts and relationships between them. To further organise concepts and relationships, the conceptual data model supports inheritance. Examples of concepts are Customer and Product; they might be connected by a relationship called Buys. Customer could have subconcepts such as Private Customer and Business Customer. All these objects can be considered part of an abstract model of the application domain, since concepts make it possible to bundle information from different tables.
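The Customer/Product example above can be sketched in a few lines of Python. This is only an illustration of concepts, inheritance and relationships, not the actual M4 schema; the class names `Concept` and `Relationship` and all feature names (`customer_id`, `date_of_birth`, and so on) are assumptions made for this sketch.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Concept:
    """A conceptual-level entity; `parent` models inheritance (hypothetical class)."""
    name: str
    parent: Optional["Concept"] = None
    features: List[str] = field(default_factory=list)

    def all_features(self) -> List[str]:
        # A subconcept inherits every feature of its ancestors.
        inherited = self.parent.all_features() if self.parent else []
        return inherited + self.features

    def is_a(self, other: "Concept") -> bool:
        # Walk up the inheritance chain looking for `other`.
        node: Optional["Concept"] = self
        while node is not None:
            if node is other:
                return True
            node = node.parent
        return False


@dataclass
class Relationship:
    """A named link between two concepts (hypothetical class)."""
    name: str
    source: Concept
    target: Concept


# Feature names below are invented for illustration only.
customer = Concept("Customer", features=["customer_id", "name"])
private_customer = Concept("Private Customer", parent=customer,
                           features=["date_of_birth"])
business_customer = Concept("Business Customer", parent=customer,
                            features=["company_name"])
product = Concept("Product", features=["product_id", "price"])
buys = Relationship("Buys", source=customer, target=product)
```

Because Private Customer is a subconcept of Customer, anything defined for Customer, including the Buys relationship, also applies to it.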
Given a conceptual data model, a graphical tool supports the creation of a mapping from the involved entities to the corresponding database objects (the relational model). The next step is to implement operators that perform data transformations such as discretisation, treatment of null values, aggregation of attributes into a new one, or collecting sequences from time-stamped data. The sequence of operators is called the case model. Setting up or adjusting cases is supported in MiningMart by a special graphical editor. Together, the three models (the conceptual data model, the relational data model and the case model) form the MiningMart meta model, M4. Thus the same formalism is used for the metadata and for the description of the KDD process. M4 is stored in separate tables of the relational database used.

The system is designed such that new operators can easily be integrated. By modelling real-world cases in future applications, further useful operators will be identified, implemented and added to the repository. To ease the process of editing cases, applicability constraints based on metadata are provided as formalised knowledge and are checked automatically by the human-computer interface. In this way a case designer can produce only valid sequences of steps. From the case model of the operator chains, the MiningMart compiler creates SQL code that performs the data processing steps as specified by the chains (see section 3).

3 MiningMart Components

MiningMart has three main components: the metadata model M4, which is explained in the previous section; the compiler, which makes a case executable; and the graphical editors. A fourth part of the project is the web platform, which allows cases to be exchanged. The compiler, the editors and the web platform are described in sections 3.1, 3.2 and 3.3, respectively.
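To make the idea from section 2 of compiling an operator chain into SQL more concrete, here is a minimal Python sketch. It is not the MiningMart compiler: the `Step` class, the `compile_case` function, the operator names `fill_null` and `discretise`, and the table and column names are all assumptions chosen for illustration. The sketch maps conceptual feature names to columns, performs a simple applicability check (a step may only read a feature that already exists), and emits one SQL SELECT, which is then run against an in-memory SQLite database.

```python
import sqlite3
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Step:
    """One operator application in a case (hypothetical structure)."""
    operator: str             # "fill_null" or "discretise" in this sketch
    feature: str              # conceptual feature the step reads
    output: str               # conceptual feature the step produces
    params: Dict[str, float]  # numeric parameter settings of the operator


def compile_case(table: str, columns: Dict[str, str], steps: List[Step]) -> str:
    """Translate a chain of steps into a single SQL SELECT over `table`.

    `columns` maps conceptual feature names to column names (the relational
    mapping). Parameters are interpolated directly, so this sketch assumes
    trusted numeric values.
    """
    exprs = dict(columns)  # conceptual feature -> SQL expression producing it
    for step in steps:
        # Applicability check: the input feature must already be available.
        if step.feature not in exprs:
            raise ValueError(
                f"{step.operator!r} needs unknown feature {step.feature!r}")
        src = exprs[step.feature]
        if step.operator == "fill_null":
            exprs[step.output] = f"COALESCE({src}, {step.params['default']})"
        elif step.operator == "discretise":
            exprs[step.output] = (
                f"CASE WHEN {src} >= {step.params['threshold']} "
                f"THEN 'high' ELSE 'low' END")
        else:
            raise ValueError(f"unknown operator {step.operator!r}")
    select = ", ".join(f'{expr} AS "{name}"' for name, expr in exprs.items())
    return f"SELECT {select} FROM {table}"


# Relational mapping and case, with invented names.
mapping = {"customer_id": "cust_no", "revenue": "rev_eur"}
case = [
    Step("fill_null", "revenue", "revenue_filled", {"default": 0}),
    Step("discretise", "revenue_filled", "revenue_level", {"threshold": 100}),
]
sql = compile_case("crm_customers", mapping, case)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE crm_customers (cust_no INTEGER, rev_eur REAL)")
conn.executemany("INSERT INTO crm_customers VALUES (?, ?)",
                 [(1, 250.0), (2, None), (3, 40.0)])
rows = conn.execute(sql).fetchall()
conn.close()
```

Running the same case on a different database would, in this sketch, only require a new `mapping` and table name; the list of steps stays unchanged, which mirrors the case adaptation idea described in the abstract.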

Similar resources

Churn Prediction in Telecommunications Using MiningMart

This paper summarises a successful application of Knowledge Discovery in Databases (KDD) in an Italian telecommunications research lab. The aim of the application was to predict customer churn behaviour. A critical success factor for this application was clever preprocessing of the given data, in particular the construction of derived predictor features. The application was realised in the Mini...

The MiningMart Approach to Knowledge Discovery in Databases

Although preprocessing is one of the key issues in data analysis, it is still common practice to address this task by manually entering SQL statements and using a variety of stand-alone tools. The results are not properly documented and hardly re-usable. The MiningMart system presented in this chapter focusses on setting up and re-using best-practice cases of preprocessing data stored in very l...

Collaborative management of a repository of KDD processes

Knowledge Discovery in Databases (KDD) is a complex and computationally intensive process that requires repeated interaction between tools and users, often in a distributed environment. Given the complexity of the process, both naïve and expert users need some support to effectively perform knowledge discovery. In this paper we present a user- and knowledge-centric approach to support the desi...

Providing User-Support in Performing Knowledge Discovery in Databases

Knowledge Management (KM) is becoming a success factor for industrial organisations. Obtaining control over and gaining information out of data helps to achieve the organisation’s goals more effectively. Thus knowledge (or information) becomes a very important resource. This resource must be adequately procured, stored, processed and communicated. These tasks are central points of Knowledge (an...

PCSE-KDD: A Process-Centered Support Environment for the Knowledge Discovery Processes

Current support for Knowledge Discovery in Databases (KDD) is provided only for fragments of the process, a particular KDD process model, or most recently certain process aspects. The support needed for a KDD process varies greatly based on the specifications of the concrete KDD process, and cannot be based purely on a generic process model. There is a need for a more comprehensive support appr...

Publication date: 2003